GeneZip: A software package for storage-efficient processing of genotype data

Authors

  • Cameron Palmer
  • Itsik Pe'er

Abstract

Genome-wide association studies directly assay 10 single nucleotide polymorphisms (SNPs) across a study cohort. Probabilistic estimation of additional sites by genotype imputation can increase this set of variants 10- to 40-fold. Even with modest sample sizes (10−10), the resulting “imputed” datasets, containing 10 − 10 double-precision values, cannot be stored losslessly in RAM all at once using standard methods. Existing solutions to this problem require compromises in either genotype accuracy or the complexity of permissible statistical methods. Here, we present a C/C++ library that dynamically compresses probabilistic genotype data as they are loaded into memory. The method uses a customized version of the DEFLATE (gzip) algorithm and maintains constant-time access to any SNP. Average compression ratios greater than 9-fold are observed on test data.
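To make the access pattern concrete, here is a minimal Python sketch. It is not GeneZip's implementation. It assumes each SNP's row of genotype probabilities (three doubles per sample, e.g. P(AA), P(AB), P(BB)) is compressed as an independent block with stock zlib DEFLATE at load time, in place of the customized DEFLATE the authors describe. The class `CompressedGenotypeStore` and its methods are hypothetical names. Because every block is self-contained and indexed by SNP position, fetching a SNP inflates only that block, so lookup cost does not grow with the number of SNPs loaded. For scale, a hypothetical cohort of 5,000 samples with 10,000,000 imputed SNPs at three probabilities per genotype holds 1.5 × 10^11 doubles, or 1.2 × 10^12 bytes (about 1.2 TB) uncompressed.

```python
import zlib
from array import array


class CompressedGenotypeStore:
    """Hypothetical sketch: one independently DEFLATE-compressed block per SNP.

    Stock zlib stands in for GeneZip's customized DEFLATE (an assumption).
    """

    def __init__(self, level: int = 6) -> None:
        self.level = level                 # zlib compression level, 0-9
        self.blocks: list[bytes] = []      # compressed rows, indexed by SNP position
        self.n_values: list[int] = []      # uncompressed double count per SNP

    def add_snp(self, probs: list[float]) -> int:
        """Compress one SNP's probabilities as it is loaded; return its index."""
        raw = array("d", probs).tobytes()  # pack as C doubles
        self.blocks.append(zlib.compress(raw, self.level))
        self.n_values.append(len(probs))
        return len(self.blocks) - 1

    def get_snp(self, idx: int) -> list[float]:
        """Random access: index the block list, then inflate only that block."""
        values = array("d")
        values.frombytes(zlib.decompress(self.blocks[idx]))
        return values.tolist()             # exact (lossless) round trip

    def raw_bytes(self) -> int:
        return array("d").itemsize * sum(self.n_values)

    def compressed_bytes(self) -> int:
        return sum(len(b) for b in self.blocks)


# Usage with made-up data: 1,000 samples, three probabilities each.
store = CompressedGenotypeStore()
confident = [1.0, 0.0, 0.0] * 1000        # every sample a certain homozygous call
mixed = [0.0, 1.0, 0.0] * 500 + [0.9, 0.1, 0.0] * 500
i = store.add_snp(confident)
j = store.add_snp(mixed)
assert store.get_snp(j) == mixed          # recovered bit-for-bit
print(store.raw_bytes() / store.compressed_bytes())  # compression ratio
```

Compressing each SNP separately gives up some ratio compared with compressing the whole dataset as one stream, because DEFLATE cannot reuse matches across independent blocks; in exchange it allows random access, which a single gzip stream does not. The ratio printed here reflects artificial, highly repetitive rows and says nothing about the figure reported for real test data.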

Similar resources

Selection of the Best Efficient Method for Natural Gas Storage at High Capacities Using TOPSIS Method

Nowadays one of the most important energy sources is natural gas. By depletion of oil reservoirs in the world, natural gas will emerge as the future energy source for human life. One of the major concerns of gas suppliers is being able to supply this source of energy the entire year. This concern intensifies during more consuming seasons of the year when the demand for natural gas increases, r...

New algorithm for tensor contractions on multi-core CPUs, GPUs, and accelerators enables CCSD and EOM-CCSD calculations with over 1000 basis functions on a single compute node

A new hardware-agnostic contraction algorithm for tensors of arbitrary symmetry and sparsity is presented. The algorithm is implemented as a stand-alone open-source code libxm. This code is also integrated with general tensor library libtensor and with the Q-Chem quantum-chemistry package. An overview of the algorithm, its implementation, and benchmarks are presented. Similarly to other tensor ...

R/Bioconductor software for Illumina's Infinium whole-genome genotyping BeadChips

Illumina produces a number of microarray-based technologies for human genotyping. An Infinium BeadChip is a two-color platform that types between 10^5 and 10^6 single nucleotide polymorphisms (SNPs) per sample. Despite being widely used, there is a shortage of open source software to process the raw intensities from this platform into genotype calls. To this end, we have developed ...

The Supertree Toolkit 2: a new and improved software package with a Graphical User Interface for supertree construction

Building large supertrees involves the collection, storage, and processing of thousands of individual phylogenies to create large phylogenies with thousands to tens of thousands of taxa. Such large phylogenies are useful for macroevolutionary studies, comparative biology and in conservation and biodiversity. No easy to use and fully integrated software package currently exists to carry out this...

Effect of Processing Temperature on Storage Quality of In-Shell Hazelnut

Background: Drying is the one of the oldest methods for increasing the shelf life of food products. The objective of the present study was evaluation of effect of different drying temperatures on drying time and storage quality parameters of in-shell hazelnut. Methods: Hazelnuts were dried as a thin layer at three temperatures (40, 50, and 60 °C). The time required for drying and quality param...

Publication date: 2013